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When an interval of integers between the lower bou nd U and th e uppe r bound m is 



the support of the marginal distribution m\(ni-\, . . . ,ni), IChen et al.l (|2005al ) noticed that 
sampling from the interval at each step, for rii during a sequential importance sampling 
(SIS) procedure, always produces a table which satisfies the marginal constraints. However, 
in general, the interval may not be equal to the support of the marginal distribution. In this 
case, the SIS proced ure may produce tables which do not satisfy the marginal constraints, 



leading to rejection ( Chen et al. . 20061 ). In this paper we consider the uniform distribution 



as the target distribution. First we show that if we fix the number of rows and columns of 
the design matrix of the model for contingency tables then there exists a polynomial time 
algorithm in terms of the input size to sample a table from the set of all tables satisfying all 
marginals defined by the given model via the SIS procedure without rejection. We then show 
experimentally that in general the SIS procedure may have large rejection rat es even with 



small tables. Further we show that in general the classical SIS procedure in (jChen et all 



1 



2005a| ) can have a large rejection rate whose limit is one. When estimating the number of 
tables in our simulation study, we used the univariate and bivariate logistic regression models 
since under this model the SIS procedure seems to have higher rate of rejections even with 
small tables. 

Keywords: Contingency tables, estimating number of tables, generating functions, semigroups, 
sequential importance sampling. 



1 Introduction 



Sampling from two-way and multiway contingency tables has a wide range of applications such as 
computing exact p- values of good ness-of-fit, estimat i ng the number of contingency tables satisfying 
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1992J). For some problems, such as sparse tables, the data of interest 



does not permit the use of asymptotic methods. In su ch cases, one can apply M onte Carlo 
Markov Chain (MCMC) procedures using Markov bases (jDiaconis fc Sturmfelsl . Il998l ). In order 
to run MCMC over the state space, all states must be connected via a Markov chain. A Markov 
basis is a set of moves on all contingency ta bles (the state space) guaranteed to be connected via a 



Markov chain ( Diaconis &: Sturmfels 



19981 ) . One important quality of a Markov basis is that the 



moves will work for any marginal sums under a fixed model. The two major advantages to using 
a MCMC approach, if a Markov basis is already known, is that it is easy to program, and it is not 
memory intensive. However MCMC methods are not without drawbacks where one bottleneck is 
the computation of a M arkov basis. In fact, for 3-way contingency tables with fixed 2-margins, 
De Loera Sz Onnl (120051 ) showed that the number of Markov basis elements can be a rbitrary. To 
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try to ci rcumvent the di f ficulty of computing a Markov basis which may be large 
( 12005bl ); iBunea &: Besagl ( 120001 ) studied computing a smaller set of moves by allowing entries of 
the contingency table to be negative. The trade off to this approach is longer running time of the 
Markov chains. Even using a standard MCMC approach, to sample a table independently from 
the distribution, the Markov chains can take a long time to converge to a stationary distribution 
in order to satisfy the independent assumption. Lastly, it is not clear in general how long the 
chain must be run to converge. 

A sequential importance sampling (SIS) procedure is easy to impleme nt and was first app lied 
to sampling two-way contingency tables under the independence model in ( iChen et all l2005al ). It 
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proceeds by simply sampling cell entries of the contingency table sequentially such that the final 
distribution approximates the target distribution. This method will terminate at the last cell and 
sample independently and identically distributed (iid) tables from the proposal distribution. Thus 
the SIS procedure does not require expensive or prohibitive pre-computations, as is the case of 
computing a Markov basis for a MCMC approach. Second, when attempting to sample a single 
table, the SIS procedure is guaranteed to sample a table from the distribution, where in an MCMC 
approach the chain may require a long time to run in order to satisfy the independent condition. In 
these regards, the SIS overcomes the disadvantages of MCMC but presents a new set of problems. 
One major difficulty is computing the marginal distribution of each cell. Typically an interval from 
which the support of the marginal distribution lies i s computed using Integer Programming (IP), 
Linear Programming (LP), or the Shuttle Algorithm (IDobra fc Fienbergj . l2010[ ). When the support 
of the marginal distribution does not equal the interval, the SIS procedure may reject the partially 
sampled table. The SIS has been successful on two-way contingency tables due to the fact that 
the computed interval often equals the support of the marginal distribution. For example, under 
the independence model, the SI S procedure will a lways produce tables satisfying the marginal 
sums, i .e. there are no reje ctions (IChen et all 120061 ). Moreover, for zero-one two-way contingency 



tables ( IChen et all l2005af ) provided an algorithm to sample using the SIS with Conditional 



oisson 



distributions which also avoids rejections and relies on the Gale-Ryser Theorem. IChen et al.l (120061 ) 
extended the SIS to multiway contingency tables, and gave excellent algebraic interpretations 
of precisely when an interval will equal the support of the marginal distribution. Regardless, 
until now, one of the major disadvantages of the SIS is the fact that rejections lead to increased 
computational time. 

In this paper we focus on the uniform distribution as the target distribution. Here we show 



that if we fix the number of rows and columns of the design matrix defined in Subsection 12.11 
then there exists a polynomial time algorithm in terms of the input size to sample a table from 
the set of all tables satisfying all given marginals via the SIS procedure without rejection. For 
the proof we use the notion of the semigroup (defined in Subsection 12. 3ft . of the rays of the design 



matrix anc 
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short g enerating functions of a set. Then we show that the classical SIS procedure in 
(j2005al ) can have a large rejection rate whose limit is one, i.e., it can be very close to 



one. 
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In order to assess the rejection rate we conduct a simulation study on the univariate and 
bivariate logistic regression models. In our simulation study we show that in general the SIS 
procedure can have a high rejection rate even with small tables. 

Notation is covered in Subsection 12. II and the SIS procedure is described in detail in Subsection 
12. 21 The semigroup of the design matrices is discussed in Subsection 12. 31 In Section [3] we will 
show that if we fix the number of rows and columns of the design matrix of the given model for 
contingency tables then there exists a polynomial time algorithm in terms of input size to sample 
a table from the set of all tables satisfying the marginals defined by the given model. In Section 
H]we show by an example that a classical SIS procedure can have a large rejection rate in general 
and then we show experimental results with the SIS in order to assess the rejection rate under the 
univariate/bivariate logistic regression models. 



2 Preliminaries 

2.1 Basic notation 

Let n be a contingency table with k cells. In order to simplify the notation, we denote by 
X = {1, . . . , k} the sample space of contingency tables. 

Let Z + be the set of nonnegative integers, i.e., Z + = {0, 1,2,.. .} and let Z be the set of all 
integers, i.e., Z = {. . . , —2, —1, 0, 1, 2, . . .}. Without loss of generality, in this paper, we represent 
a table by a vector of counts n = (ni, . . . , n^). With this point of view, a contingency table n can 
be regarded as a function n : X — > Z + , and it can also be viewed as a vector n e Z^ . 

The fiber of an observed table n b s with respect to a function T : Z^ — > 7L d + is the set 

.Mn obs ) = {n | n G Z k + , T(n) = T(n obs )} . (1) 

When the dependence on the specific observed table is irrelevant, we will write simply Tt instead 
of JrCiiobs). 

In a mathematical statistics framework, the function T is usually the minimal sufficient statistic 
of some statistical model and the usefulness of enumeration of the fiber J-Wn b s ) follows from 



classical theorems such as the Rao-Blackwell theorem, see e.g. f lShao 



1998|) 



When the function T is linear, it can be extended in a natural way to a homomorphism from 
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M fc to M. d . The function T is represented by an d x fc-matrix At, and its element At(£, h) is 

A T (£,h)=T t (h), (2) 

where is the £-th component of the function T. The matrix At is called the design matrix of 
the model T. In terms of the matrix A?, the fiber Tt can be easily rewritten in the form: 

F T = {n | n £ Z k + , A T n = A T n ohs } . (3) 

When the context is clear, we will simply write A instead of At- 

Remark 2.1. Since the cell counts of a contingency table are nonnegative integers and usually 
the sufficient statistics are a set of marginals of cell counts, the design matrix A of a model T is 
a nonnegative d x k matrix. 



2.2 Sequential importance sampling 



Let Tt be the set of all tables satisfying marginal conditions (for example, under the independence 
model, all tables satisfying given row and column sums). Let p(n), for any n £ Tt, be the uniform 
distribution over Tt, so p(n) = l/lJ 7 ^!- Let q(-) be a trial distribution such that g(n) > for all 
n £ Tt- Then we have 



E 
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q(n) 



\F 



T\ 



Thus we can estimate \Ft\ by 



1 N 1 

iV ^ g(n i ) 



where ni, . . . , are tables drawn iid from q(n). Here, this proposed distribution q(n) is the 
distribution (approximate) of tables sampled via the SIS. 

If we vectorize the table n = (ni, ■ • • , n^) then by the multiplication rule we have 

q(n = (m, • ■ ■ ,n k )j = g(ni)g(n 2 |ni)g(n 3 |n 2 , ni) • - ■ • • • ,ni). 

Since we sample each cell count of a table from a interval we can easily compute g(nj|nj_i, . . . , ni) 
for i = 2, 3, . . . , k. 

When an interval of integers between the lower bound /; and the u pper bound itj is the support 
of the marginal distribution ni\(nn, . . . , ni) for n^, IChen et al.l ( 120061 ) noticed that one can sample 



a value from the interval at each step for m from the interval [h,Ui] and this procedure always 
produces a table which satisfies the marginal constraints. Therefore if we can obtain Zj and 
Ui for each n, sequentially we can apply the SIS. Usually we obtain Zj and it, for each rti by 
Integer Programming (IP) to obtain tight bounds, namely we solve the linear integer programming 
problem for the lower bound: 

min Xi 

s.t. BLiXi H h a fc x fc = b - (aiz* H h Zi-ix*^) , (4) 

Xi, . . . X}~ G 

where integers already sampled by the SIS, b = An b s , and a^ is the jth column of 

A. To compute the upper bound via IP we set max instead of min in Equati on flU). One can ap 



proxi mate these bounds by linear programming (LP) or the Shuttle Algorithm f lBuzzigoli fc Giustil . 



19981 1 . however they might n ot give tight boun ds. 



Based on this observation 



Chen et al. 



( 120061 ) developed a sequential importance sampling (SIS) 



method to sample a table from Tt- The outline of the SIS procedure is the following: 
Algorithm 2.2. [Sequential importance sampling procedure] 

1. For i = 1, . . . , k do: 

(a) Compute Zj and by solving an integer programming problem fll]). 

(b) Sample an integer x* from the interval [li,ui\ according to the distribution q. 

2. Return the table x* = (x{, . . . ,x* k ). 

Remark 2.3. If we want to estimate \ J~t\ then q is the uniform distribution over [Zj,Mj] HZ. Thus 
we sample x* from [Zj, uj D Z with a probability l/(uj — Z.; + 1). 

When we have rejections, this means that we are sampling tables from a bigger set J 7 ^ such 
that Tt C T t . In this case, as long as the conditional probability q^^n^x, . . . , n±) for i — 2, 3, . . . 
and q(ni) are normalized, g(n) is normalized over T T since 

Enexf 9( n ) = Em,...n h Q( n M n 2 K)g(n 3 |n 2 , m) • • • q(n k \n k - u . . . , m) 
= E ni ?( n i) [En 2 ^KI n 2) [•■• [J2n k <l( n k\ n k-i, ■ ■ ■ ,ni)]]] 
= 1. 
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Thus we have 

where I ne jr T is an indicator function for the set J-y. By the law of large numbers this estimator is 
unbiased. 

2.3 Semigroup 

Consider the following system of linear equations and inequalities: 

Ax = b, x > 0, (5) 

where A G Z dxk and b G Z d . Suppose the solution set {x G M. k : Ax = b, x > 0} ^ 0. 

Note that there exists an integral solution for the system in (jSJ) if and only if b is in the 
semigroup generated by the column vectors a 1; . . . ,a. k of A, that is, the set of all nonnegative 
integer combinations of the columns of A, namely 

Q = Q(A) = {a^i H h a. k x k \xt,...,x k e Z + } . (6) 

Let K = K(A) be the cone generated by the columns ai, . . . , a^ of A, that is: 

K = K(A) = {ai^i H h a. k x k \x ± ,...,x k e R + } . 

The lattice L = L(A) generated by the columns ai, . . . , a& of A is: 

L = L(A) = {a.xXi + h a„x„, | a?i, . . . , x k G Z} . 

The semigroup Q sa t = K (1L is called the saturation of the semigroup Q- It follows that Q C Q sa t 
and we call Q saturated if Q = Q sat (this also is called normal). We define H = Q sat \ Q the set 
of holes of the semigroup Q. 

Remark 2.4. If b G if C Qsat? t/ien t/ie system 

Ax = b,x > 0,x G M fc 
/ias a feasible solution. However, the system 

Ax = b,x > 0,x G Z fe 

does noi /jai>e a feasible solution. 
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3 Time complexity of the SIS procedure without rejection 



In 2002. iBarvinok &: Woodsl (120031 ) introduced an algorithm to encode all integral vectors b G Z fc 
in a semigroup Q(A) as a short rational generating function in polynomial time in terms of input 
size when d and k are fixed (Lemma 13.21 stated below). 

One might a sk the time complexity of the SIS procedure without rejection in general. U sing 



the results from (IBarvinok 



1994 



Barvinok fc Pommersheim 



1999 



Barvinok fc Woods 



20031 ). we 



can prove that the SIS procedure without rejection can be solved in polynomial time in fixed d and 
k (Theorem I3.ip . In order to prove the theorem, we will use the multivariate generating function 
of a set X C Z d , /(X; x). Namely, if X C Z d , we define the generating function 



f(X;x) = J2 



x 



sex 



where x s denotes xl 1 ■ ■ ■ x s d d with s — (si, . . . , s^). If X = P D 1 d with fixe d d, where P is a rational 



Barvinok! (Il994f ) and 



Barvinok fc Woods 



convex polyhedron, or if X = Q with fixed d and k, then 

( 120031 ). respectively, showed that f(X;x) can be written in the form of a polynomial-size sum of 
rational functions of the form: 



f(X;x) = Y,lr 



X 



iei 



n 



(?) 



Herein, / is a finite (polynomial size) index set and all the appearing data ji G Q and aj, G Z d 
is of size polynomial. If a rational generating function f(X; x) is polynomial size in the total bit 
size of inputs, then f(X; x) is called a short rational generating function. As an example, if P is 



the one- dimensional polytope [0, N], N G Z + , then /(PnZ; x) 
can be represented by a short rational generating function (1 



1 + 2 + or 



.r 



f(PnZ;x) 



x 



N+l 



)/(l-x). 



Theorem 3.1. Suppose we fix d and k. Assume that the design matrix for the model is A > 0. 
There is a polynomial time algorithm in terms of the input size to sample a table n via the SIS 
procedure without rejection from the set \Ft\ where Tt is defined in (J3]). 



B efore proofs of Theorem EL 



and (IBarvinok fc Pommersheim 



we w ould like to state lemmas from (IBarvinok &: Woodsl . 120031 ) 



19991). 



Lemma 3.2 ((7.3) in (IBarvinok fc Woodsl . 120031 1). Suppose we fix d and k. Let Q = Q(A). Then 
there exists an integer s = s(d) and the generating function f(Q; x) for the semigroup Q can be 



S 



computed in polynomial time in terms of the input size as a short rational generating function in 
the form of 

/(S;s) = X> 



00 



iei 



where 7, G Q, Ui, Vij G Z d and Vij 7^ /or a// 



Lemma 3.3 (Theorem 4.4 in (IBarvinok fc Pommersheiml . Il999l )). Suppose we fix d and suppose 
P cK d is a rational convex polyhedron. Then the generating function f(PnZ d ; x) can be computed 
in polynomial time in terms of the input size as a short rational generating function in the form 

of m. 



The following lemma is an extension of Theorem 3.6 in ( iBarvinok fc Woods! . 120031 ). 



Lemma 3.4. Suppose we fix d. Let Si,S 2 C M. d such that Si is unbounded and S 2 is finite. Suppose 
there exists a vector I G M. d such that (I, x) < for all x G S±, where (•, •) denotes a dot product 
of vectors, then there exists a polynomial time algorithm which given fi{Si,x) and /^(S^a;), the 
short generating function for Si and S2, respectively, computes f(S; x) for S = Si nS% in the form 

where s < 2 • d, % G Q, ttj, G Z d and 7^ for all 



f(S;x) = ^ZlijY 



x 



Vil 



X L 



Proof. Since there exists a vector I G 



for Lemma 3.4 in (IBarvinok fc Woods 



such that (I, x) < for all x G Si, we can apply the proof 



20031 ) to prove this lemma. 



□ 



Remark 3.5. Similarly we can apply the same way to prove that if we fix d and Si, 5*2 C M. d 
are the two unbounded sets such that there exists a vector I G M. d such that (I, x) < for all 
x G Si and (I, x) < for all x G S2, then there exists a polynomial time algorithm which given 
fi(Si;x) and f 2 (S2]x), the short generating function for Si and S2, respectively, computes f(S;x) 
for S = Si H S2 in the form 

x Ui 



X Vt3 ) 



where s < 2 • d, ji G Q, Ui, vu G Z d and 7^ for all 



Using the generating function we have the following algorithm to sample a table n from Tt 
via the SIS procedure without rejection in general. 
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Algorithm 3.6. Input System 

Ax = b,xeZ k + , (8) 

where A = (ai,...,afe) > is a d x k matrix and b = (bi,...,ba) = Arn b s , where n Q b s is the 
observed table, is a d dimensional vector. 

Output n = (jii, . . . , Hk) sampled via the SIS without rejection and its probability to be picked 
from Tt- 

Algorithm 

1. Setp= 1. 

2. For i — 1, . . . ,k — d do 

(a) Set the system A'x — b',xE Z+~ t+1 \ where b' = b — (X^-=i ^j x j) ant ^ ^ = ( a «' •••; a fc)- 

(b) Compute the semigroup Q' generated by (ai+i, via generating function and com- 
pute the generating function for P fl Z d where P = {x G M. d : b' — xa» > 0}. 

(c) Since a £ Z+ \ {0}, Va G Q 1 , we choose I = (— 1, —1, • ■ ■ , —1) to compute Q' fl P via 
generating functions. 

(d) Sample uniformly from Q' n P. 

(e) p = p-(l/#\Q'nP\). 

3. Find nk-d+i, ■ ■ ■ , Uk by solving the system: 

AfinalX = b final, X G Z+, 

where b fina i = b- {Ylj=i A i x i) and A finai = (afc-d+i, a k ) 
4- Return n and its probability to be picked p. 
Lemma 3.7. Q' fl P in StepW^ is not empty. 

Proof. The right hand side b = Arn Q b s from the input is computed from the observed table n Q b s so 
there exists at least one solution in the system (jSJ). Thus b G Q(A). Thus Q' (IP ^ for i = 1. For 
i — 2, . . . (k—d), since b' G Q(A') where Q(A') is the semigroup generated by a,, ... , a^, there exists 
a nonnegative integral solution for the system in A'x — x G Z+~ ;i+1 ' ) . Thus Q' fl P 7^ 0. □ 
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Remark 3.8. The system in Stepffi has a solution as well since b' G Q(A'). 

Proof for Theorem \3.1[ We fix d and k. For Step [2b] we can compute the generating functions 
fi(Q'; x) and /2(-P; x) in polynomial time in terms of input size if we fix d and k by Lemmas 13.21 
and 13.31 One can compute the short generating function for Q' D P in Step [2c] in polynomial time 
by Lemma [37^1 For Step I2dl o ne ca n sample a point from Q' (IP in polynomial time via the method 



of sampling showed in (IPakl . 120001 ) since Q' fl P C P. One can compute D P\ in polynomial 



time by using the Taylor expansion of the limit of the short generat ing; function with x around 



(1, 1. . . . , 1) similarly to how we count the number of lattice points in f lBarvinok Sz Pommersheiml . 



19991 ). Q' fl P ^ by Lemma [3T71 and Remark [3.81 so we do not reject the sampled table. □ 



Remark 3.9. Note that implementing Alaorithm \3.6i seems not to be practical since it seems not 
to be possible in practice at this moment to implement an algorithm in Theorem \3.2\ to compute 
the generating function for a semigroup in polynomial time with fixed d and k even though we have 
a computational time result. 



4 Computational experiments 



In general the classical SIS procedure in ([Chen et al 



2005al ) can have an arbitrary large rejection 



rate. For example, if we have the following system 

xi + a ■ x-i — a + 1, x±, x<i G Z+, 

where a G N is an arbitrary large positive integer. Then by integer programming we have l\ — 
and u\ = a + 1. However, only two integral points in the interval [0, a + 1] for x\ can give a 
solution to the system, namely X\ — 1 or X\ — a + 1. Thus the rejection rate is 1 — = -^p^- 
Thus if we send a — > oo, the rejection rate becomes 1. 

Thus it is interesting to assess the rejection rates in practice. In this section we conduct a 
simulation study under the univariate/bivariate logistic regression models. We chose this model 
since the SIS procedure seems to have a very high rejection rate even with small tables. 

We used two types of random table generators: 

1. A table generator using the Poisson distribution. 
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2. A table generator using the uniform distribution. 
First we show how we generated random tables for Item [U 

• Input: A positive integer k > 1 and a positive rational number A. 

• Output: A randomly generated table n = (n 1; . . . , n*.). 

1. For each cell rii of the table X, pick a random number r uniformly from (0, 1). 

— if < r < Jr, pick a random integer x uniformly from (1, 1000) and assign rii = x, 

— else if < r < ~, assign rii = 1 + Poisson(X), where Poisson(X) is the Poisson 
distribution with a random variable A, 

— else assign rii = 0. 

2. If there is no cell in the table such that rii > 10, then we randomly pick cell rii uniformly. 
Pick a random integer x uniformly from (1, 1000) and assign rii = x. 

3. Return n. 

Now we show how we generated random tables for Item [2j 

• Input: A positive integer k > 1. 

• Output: A randomly generated table n = (ni, . . . , n^). 

1. For each cell rii of the table n, pick a random number r uniformly from (0, 1). 

— if < r < pick a random integer x uniformly from (1, 1000) and assign rii = 

— else if < r < |, pick a random integer y uniformly from (1, 10) and assign rii — V, 

— else assign rii — 0. 

2. If there is no cell in the table such that rii > 10, then we randomly pick cell rii uniformly. 
Pick a random integer x uniformly from (1, 1000) and assign rii = %■ 

3. Return X. 

In this section we consider the bivariate regression model since we verified that the semigroup 
is not normal in each experiment, i.e., there would be rejections with the SIS procedure. Further, 
the problems are small enough so that we can count the exact number of tables. 

12 



Let {1, ...,/} and {1, . . . , J} be the set levels of two covariates. Let Xuj and X2ij, i = 1, . . . , I, 
j — 1, . . . , J, be the numbers of successes and failures, respectively, for level The probability 

for success puj is modeled as 

logit(piy) = log (t^—\ =fi + m + (3j, (9) 

1 = 1,...,/, j = l,...,J. 

The likelihood is written as 

i j 

L(a, A 7 )«nil( 1 + ex P(« + ^ + 7J'))~ re+ " 

i=l j=l 

I J 

x n n exp ( an ^j + P in ^ + 7jf"Kj) 

i=l j=l 

i j 



I an 1++ + (3 ^ in u+ + 7^J n i+J I 

V i=l j=l J 



i=l 3=1 

x exp 



Thus the sufficient statistics for this model are Xi ++ , Y^l=i Sj=ii^i+i> ^+»i> ^> 3- 

In our simulation study on estimating the number of tables via the SIS, we generated 100 
tables under Model for each experiment with sample size N = 100. 



One can find our software at http://polytopes.net/code/SIS 



5 Discussion and open problems 

In this paper we showed that if we fix the number of rows and number of columns of the design 
matrix we can sample a table from the set of all tables given the marginals via the SIS procedure 
without rejection in polynomial time in terms of input size. Also we show that in general the 
rejection rate can be arbitrary large. However we have not been able to implement Algorithm 13.61 
because at present it is very hard to implement a method in Theorem 13 .21 It would be interesting 
to develop a polynomial time algorithm to sample a table from the set of all tables satisfying all 
marginal conditions in polynomial time with fixed dimensions. 
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option 


model 


I,J 


time (sec) 


reject 


1 


univariate 


5 


6.72 


14 






6 


8.84 


14 






7 


11.14 


27 






8 


11.56 


27 






9 


12.62 


30 






10 


14.78 


30 




Bivariate 


2,5 


14.82 


28 






2,6 


18.32 


30 






2,7 


21.9 


42 


2 


univariate 


5 


6.28 


8 






6 


7.4 


14 






7 


9.04 


20 






8 


19.8 


28 






9 


13.51 


31 






10 


15.38 


27 




Bivariate 


2,5 


14.88 


27 






2,6 


18.33 


30 






2,7 


21.59 


30 



Table 1: In this experiment we set the sample size for each SIS procedure ./V = 100 and we generated 
100 random tables. The first column represents the distribution of random table generator. For Item [TJ 
we generated a random table according to the distribution with A = 1. In this table the lower and upper 
bounds for each cell for sampling a table were computed by IP. The second column represents the model. 
Univariate means the univariate logistic regression model with label /. Bivariate means the bivariate 
logistic regression model with label I and J. The third column represents the level(s) of each variable. 
The fourth columns show the computational time in seconds. The last column shows the number of 
random tables out of 100 tables on which we had rejections. 
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